Salary

This sub-chapter shows an analysis of salary for different occupations in New York City.

Overview of Salary Distribution

To get an overview of how salary is distributed across occupations in New York City, we first draw a Cleveland dot plot of the 10-year average salary of each occupation.

ggplot(NewSalary) +
  geom_point(aes(Salary_occupation, reorder(Occupations,Salary_occupation)),color = "royalblue3", size = 2, alpha = 0.75) + ylab('Occupation') + xlab('Salary') + 
  ggtitle('Average Salaries of Different Occupations in NYC') +
  scale_x_continuous(labels = ks)+
  mytheme

Observations on Average Salaries by Occupation in NYC:

  1. Huge Differences in Salary for Different Occupations.
  • There is a huge difference in salary between occupations. The range is up to 72947, which is about three times the minimum salary.
  2. Five clusters for salary distribution.
  • The first group is Legal, which has a much higher average salary compared with all other occupations.
  • The second group includes 7 different occupations, which rank second in salary level among all occupations. We can also divide the second group into four sub-groups according to salary.
    • The first sub-group contains only Health diagnosing and treating practitioners and other technical, which has the second-highest salary.
    • The second sub-group includes Computer and mathematical and Management. They have very similar average salaries; the difference between the two occupations is only 810.
    • The third sub-group includes Law enforcement workers, Business and financial operations and Architecture and engineering. The salary range of this sub-group is also small (the computed value is reported as NA).
    • The fourth sub-group contains only one occupation, which is Life, physical, and social science.
  • The third group contains Arts, design, entertainment, sports, and media, Health technologists and technicians, Education, training, and library, Installation, maintenance, and repair, Community and social service and Construction and extraction. The salary range of this group is 7128.
  • The fourth group contains Office and administrative support, Sales and related, Transportation and Fire fighting and prevention. The salary range of this group is 3932.
  • The fifth group contains Production, Healthcare support, Building and grounds cleaning and maintenance, Material moving, Personal care and service, Farming, fishing, and forestry, Food preparation and serving related. The salary range of this group is 8016.
  3. Gaps between the five clusters.
  • From the first group to the fifth group, the salary gaps between adjacent groups are 11623, 13564, 6048 and 11989.
  • Among all the gaps, the biggest one is between group 2 and group 3, at 13564. The smallest gap is between group 3 and group 4, at 6048.
  4. Top 3 and Last 3 Occupations in Salaries.
  • Top 3
    • Legal
    • Health diagnosing and treating practitioners and other technical
    • Computer and mathematical
  • Last 3
    • Personal care and service
    • Farming, fishing, and forestry
    • Food preparation and serving related
  • As can be seen from the plot, the salary range across the top 3 occupations is large. For the last 3 occupations, however, the salary range is relatively small.
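
The range and gap figures quoted above come from simple arithmetic on the sorted average salaries. The sketch below illustrates that arithmetic in base R on made-up numbers; the vector `avg_salary` and its values are hypothetical and are not the chapter's data.

```r
# Hypothetical average salaries (illustrative only, not the chapter's data)
avg_salary <- c(Legal = 100000, Management = 80000, Computer = 79000,
                Education = 55000, Sales = 45000, Food = 30000)

# Sort from highest to lowest paid
sorted <- sort(avg_salary, decreasing = TRUE)

# Gap between each occupation and the next lower-paid one
gaps <- -diff(sorted)
names(gaps) <- paste(names(sorted)[-length(sorted)], names(sorted)[-1], sep = " -> ")

# Overall range and its size relative to the minimum salary
salary_range <- max(avg_salary) - min(avg_salary)
range_ratio <- salary_range / min(avg_salary)

print(gaps)
print(salary_range)
print(range_ratio)
```

Large entries in `gaps` mark natural boundaries between clusters, while small entries indicate occupations that belong in the same group.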

Analysis of Salary by Year

One of the factors that affects salary is time, so we first analyze how salaries differ across years.

General Trend of Salary by Years

NewSalary1 <- NewSalary[NewSalary$year == "year2010" | NewSalary$year == "year2015" | NewSalary$year == "year2019",]
NewSalary1$year<- plyr::revalue(NewSalary1$year, c("year2010" = "2010","year2015"="2015","year2019"="2019"))

ggplot(NewSalary1) +
  geom_point(aes(Salary_year, reorder(Occupations,Salary_avg), color = year),size = 2, alpha = 0.75) + ylab('Occupation') + xlab('Salary') + 
  ggtitle('Salaries of Different Occupations of Different Times in NYC')+
  scale_x_continuous(labels = ks)+
  mytheme

Observations on Salaries by Occupation in NYC from 2010 to 2019:

  1. Salary Trends for Top 3 and Last 3 Occupations

Top 3

1.) Legal occupations

The salary of this occupation increases monotonically, and the rate of increase is also accelerating.

2.) Health diagnosing and treating practitioners and other technical occupations

This occupation also has a monotonically increasing trend in salary.

3.) Computer and mathematical occupations

This occupation also has a monotonically increasing trend in salary.

Last 3

1.) Food preparation and serving related occupations

This occupation had the lowest salary in 2010. However, its salary has been increasing over the years.

2.) Farming, fishing, and forestry occupations

The salary for this occupation decreased first and then increased. However, by 2019 it still had not returned to its 2010 level.

3.) Personal care and service occupations

The salary for this occupation also decreased first and then increased. Unlike Farming, fishing, and forestry occupations, the salary dropped only slightly before increasing substantially. Overall, therefore, the salary of this occupation increased.

To see the salary variance of the 25 occupations in detail, we draw boxplots for comparison.

ggplot(NewSalaryWithVariance) +
  geom_boxplot(aes(y = reorder(x = Occupations, Salary_YearlyAvg, FUN = median), x = Salary_YearlyAvg),
               color = "black", fill = "dark red", alpha = 0.7) + 
  scale_x_continuous(labels = ks)+
  ggtitle("Boxplots with Salaries for Different Occupations") +
  xlab("Salary")+
  ylab("Occupations") +
  mytheme2

Observations from Boxplot of salaries per year:

Salaries that seem prone to dramatic fluctuation over time:

1.) Legal occupations

2.) Computer and Mathematical occupations

3.) Architecture and engineering occupations

4.) Life, physical, and social science

5.) Farming, fishing, and forestry occupations

Salaries that have fluctuated very little over time:

1.) Law enforcement

2.) Health technologists and technicians

3.) Installation, maintenance, and repair

4.) Community and social service

5.) Healthcare support

6.) Material moving

One caveat when reading this plot is that the salary level of a particular occupation may change the viewer's perception of what constitutes more variation. To address this, a second plot has been created in which the box plots are normalized by dividing the salary in each year by the mean salary of that sector. While the absolute variation won't be apparent, the relative degree of fluctuation becomes clearer.

NewSalaryWithVariance2 <- NewSalaryWithVariance %>% select(1,2,5)%>%unique()
NewSalaryWithVariance2$Normalized<-NewSalaryWithVariance2$Salary_YearlyAvg/NewSalaryWithVariance2$Salary_occupation
  
ggplot(NewSalaryWithVariance2) +
  geom_boxplot(aes(x = Normalized, y = reorder(x = Occupations,Normalized, FUN = median)),
               color = "black", fill = "dark red", alpha = 0.7) + 
  ggtitle("Normalized salary in sector per year") +
  xlab("Salary") + ylab("Occupations") +
  mytheme1
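
To illustrate why the normalization matters, the following base R sketch applies the same division to made-up data. The data frame `toy` and its values are hypothetical; only the column names `Salary_YearlyAvg`, `Salary_occupation` and `Normalized` follow the code above, and the occupation mean is assumed to be the plain mean over the years shown.

```r
# Hypothetical yearly salaries for two occupations (not the chapter's data)
toy <- data.frame(
  Occupations = rep(c("High-paid", "Low-paid"), each = 3),
  year = rep(c("year2010", "year2015", "year2019"), times = 2),
  Salary_YearlyAvg = c(90000, 100000, 110000, 18000, 20000, 22000),
  stringsAsFactors = FALSE
)

# Assumed: mean salary per occupation across years, repeated on every row
toy$Salary_occupation <- ave(toy$Salary_YearlyAvg, toy$Occupations, FUN = mean)

# Normalized salary: 1 means "equal to the occupation's own average"
toy$Normalized <- toy$Salary_YearlyAvg / toy$Salary_occupation

# Spread in raw dollars versus spread after normalization
raw_spread  <- tapply(toy$Salary_YearlyAvg, toy$Occupations, function(s) max(s) - min(s))
norm_spread <- tapply(toy$Normalized, toy$Occupations, function(s) max(s) - min(s))

print(raw_spread)
print(norm_spread)
```

In raw dollars the high-paid occupation looks far more volatile, yet after normalization both occupations fluctuate by the same relative amount, which is the effect the second plot is meant to expose.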

Observations from salaries in sector per year:

In this plot, the yearly salaries of each occupation are normalized by the occupation's average salary. Several features stand out in the normalized boxplot.

1.) Farming, fishing and forestry ranks first in salary variation. This makes sense because this occupation has relatively low salaries, so after normalization it is more likely to show high year-to-year variation.

2.) Construction and extraction also has a large variance, even though the average salary for this occupation is not very low. There are two outliers in this sector: the lowest salary, 26284, in 2010, and the highest salary, 56175, in 2018. Without these two data points, the variance would be much smaller. One possible explanation is that more buildings were under construction in 2018 and far fewer in 2010.

3.) Healthcare support and Law enforcement workers are the two most stable sectors: among all occupations, they have the smallest variance after normalization. Of all the occupation types, law enforcement occupations seem to have the most stable salaries. This might be because the salaries for these kinds of jobs are set by the local government, and government jobs tend to have stable and consistent pay.

Percentage Difference by Years

It is also important to analyze how much salaries vary within each occupation. Because occupations have different base wages, it can be more meaningful to express wage fluctuations as a percentage. Here, we use the average wage to represent the wage of each occupation.

# draw a bar chart of the percentage difference in salary over the years

NewSalaryWithVariance$year <- as.factor(NewSalaryWithVariance$year)
NewSalaryWithVariance$Occupations <- as.factor(NewSalaryWithVariance$Occupations)

# keep the 2010 rows; index with a logical vector rather than a one-column tibble
NewSalaryWithVariance1 <- NewSalaryWithVariance[NewSalaryWithVariance$year == "year2010", ]
NewSalaryWithVariance1 <- NewSalaryWithVariance1 %>% 
  mutate(variance_pct = (variance/Salary_YearlyAvg))

NewSalaryWithVariance1 <- NewSalaryWithVariance1%>%select(1,6) %>% unique()

ggplot(NewSalaryWithVariance1,aes(x=fct_reorder(Occupations, abs(variance_pct)), y = variance_pct)) +
  geom_col(fill = "royalblue3",alpha = 0.75)+
  coord_flip()+
    theme(axis.text=element_text(),
      axis.title=element_text(face="bold"),
      plot.title = element_text(face = "bold"))+ xlab('Occupations') + ylab('Percentage Difference') + ggtitle('Percentage Difference of Salaries over Years')+
  mytheme

Observations from percentage difference of salary over years:

As the plot above shows, the majority of these occupations have seen salary increases over the past decade. Only two of these categories have decreased in salary. Among all occupations, Construction and extraction occupations have the biggest percentage difference in salary between 2010 and 2019, and Healthcare support occupations have the smallest.
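
The percentage difference used in this subsection can be sketched on made-up data as follows. The definition of the `variance` column is not shown in this chapter, so the sketch assumes it is the 2019 salary minus the 2010 salary; the data frame `toy` and the columns `salary_2010` and `salary_2019` are hypothetical.

```r
# Hypothetical salaries in the first and last year (not the chapter's data)
toy <- data.frame(
  Occupations = c("A", "B", "C"),
  salary_2010 = c(100000, 20000, 50000),
  salary_2019 = c(110000, 26000, 48000),
  stringsAsFactors = FALSE
)

# Assumed definition: change from 2010 to 2019, relative to the 2010 salary
toy$variance <- toy$salary_2019 - toy$salary_2010
toy$variance_pct <- toy$variance / toy$salary_2010

# Order by absolute percentage change, largest first
toy <- toy[order(abs(toy$variance_pct), decreasing = TRUE), ]
print(toy[, c("Occupations", "variance", "variance_pct")])
```

Occupation A gains the most in dollars, but occupation B ranks first once the change is expressed relative to its lower base wage. A negative `variance_pct` marks an occupation whose salary fell.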

Analysis of Salary by County

ggplot(NewSalary) +
  geom_point(aes(Salary_county, reorder(Occupations,Salary_county), color = Boroughs),size = 2, alpha = 0.75) + ylab('Occupation') + xlab('Salary') + 
  scale_x_continuous(labels = ks)+
  ggtitle('Salaries of Different Occupations of Different Counties in NYC')+
  mytheme

Observations on Salaries by Occupation in NYC for different counties:

As can be seen in this plot, the counties with the highest and lowest wages differ from occupation to occupation. For the majority of the occupations, the highest salaries are in New York County and the lowest salaries are in Bronx County. For the relatively low-paid occupations, the highest salaries are in Richmond County.

Distribution of the Highest and Lowest Wages in Different Counties

We draw a dodged bar chart to reflect the specific distribution data of the highest and lowest wages in different counties.

Observations on the Distribution of the Highest and Lowest wages in Different Counties:

As can be seen from this plot, many occupations have their lowest salaries in Bronx County and some have them in Kings County or New York County, but none have them in Queens County or Richmond County.

For the highest salaries, the majority of occupations have them in New York County or Richmond County. Several have them in Queens County, but none have them in Bronx County or Kings County.
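
The code behind this bar chart is not shown in the chapter. The following base R sketch shows one way such counts could be produced; the data frame `toy`, its values and the helper `count_by_borough` are hypothetical, and the column names `Occupations`, `Boroughs` and `Salary_county` are borrowed from the plotting code earlier in this section.

```r
# Hypothetical county salaries for three occupations (not the chapter's data)
toy <- data.frame(
  Occupations = rep(c("A", "B", "C"), each = 3),
  Boroughs = rep(c("Bronx", "New York", "Richmond"), times = 3),
  Salary_county = c(40000, 60000, 50000,
                    30000, 31000, 33000,
                    52000, 70000, 51000),
  stringsAsFactors = FALSE
)

# For each occupation, find the borough with the highest and lowest salary
by_occ <- split(toy, toy$Occupations)
highest <- sapply(by_occ, function(d) d$Boroughs[which.max(d$Salary_county)])
lowest  <- sapply(by_occ, function(d) d$Boroughs[which.min(d$Salary_county)])

# Count how often each borough is the highest or lowest (zeros included)
boroughs <- unique(toy$Boroughs)
count_by_borough <- function(x) as.vector(table(factor(x, levels = boroughs)))
counts <- rbind(Highest = count_by_borough(highest),
                Lowest = count_by_borough(lowest))
colnames(counts) <- boroughs
print(counts)

# Dodged bar chart: one pair of bars per borough
barplot(counts, beside = TRUE, legend.text = TRUE,
        ylab = "Number of occupations")
```

Converting to a factor with explicit levels keeps boroughs that never rank highest or lowest in the table with a count of zero, so they still appear in the chart.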

To see whether the distribution of the highest and lowest wages across counties changes over time, we draw a stacked bar chart by year. We use different colors to represent different counties.

Observations on the Probability of having the Highest/Lowest Salaries in Different Counties:

As can be seen in this plot, from the perspective of each year alone, the situation is slightly different from the overall average, which is reflected in the following aspects.

  1. For the highest salary

In the overall averages, no occupation has its maximum salary in Bronx County or Kings County. However, as can be seen from the stacked bar chart, in 2010, 2012, 2013 and 2014 there are some occupations with their highest salary in Bronx County. Also, in every year except 2014 and 2017, there are some occupations with their highest salary in Kings County.

  2. For the lowest salary

In the overall averages, no occupation has its minimum salary in Queens County or Richmond County. However, as can be seen from the stacked bar chart, in every year except 2013 there are some occupations with their lowest salary in Queens County. Also, in every year except 2018, there are some occupations with their lowest salary in Richmond County.

  3. Percentage Difference in salaries between different counties

We also find that the degree of variation among boroughs differs from occupation to occupation. Therefore, we use a bar chart to rank occupations by how much their salaries vary among boroughs. For each occupation, we subtract the smallest county salary from the salary in each of the five counties, add up the differences and divide the sum by 5. Then, we divide this value by the occupation's average salary (Salary_occupation in the code) to represent the variation of each occupation.

Percentage Difference of Salaries in Different Counties

CountySalary3 <- CountySalary %>% select(1:5)
CountySalary3$variance <- with(CountySalary3, Salary_county - MinValue)

CountySalary3 <- CountySalary3 %>% 
  group_by(Occupations) %>% 
  mutate(variance_pct = (sum(variance)/5)/Salary_occupation) %>%
  ungroup() %>% select(1,7) %>% unique()

ggplot(CountySalary3,aes(x=fct_reorder(Occupations, variance_pct), y = variance_pct)) +
  geom_col(fill ="royalblue3",alpha = 0.75)+
  ylab('Percentage Difference') + xlab('Occupations') +
  coord_flip()+ 
  ggtitle('Percentage Difference of Salaries in Different Counties') +
  mytheme
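
The calculation above can be traced by hand on made-up data. In this base R sketch, the data frame `toy` and its values are hypothetical, and `Salary_occupation` is assumed to be the mean of the five county salaries, which may differ from how the chapter's column was built.

```r
# Hypothetical county salaries for two occupations (not the chapter's data)
toy <- data.frame(
  Occupations = rep(c("A", "B"), each = 5),
  Boroughs = rep(c("Bronx", "Kings", "New York", "Queens", "Richmond"), times = 2),
  Salary_county = c(40000, 45000, 60000, 50000, 55000,
                    30000, 31000, 32000, 30500, 31500),
  stringsAsFactors = FALSE
)

# Smallest county salary and (assumed) average salary within each occupation
toy$MinValue <- ave(toy$Salary_county, toy$Occupations, FUN = min)
toy$Salary_occupation <- ave(toy$Salary_county, toy$Occupations, FUN = mean)

# Excess of each county over the occupation's minimum
toy$variance <- toy$Salary_county - toy$MinValue

# Mean excess over the five counties, relative to the average salary
mean_excess <- tapply(toy$variance, toy$Occupations, function(v) sum(v) / 5)
avg_salary  <- tapply(toy$Salary_occupation, toy$Occupations, mean)
variance_pct <- mean_excess / avg_salary
print(variance_pct)
```

For occupation A the excesses sum to 50000, so the mean excess is 10000 and the ratio to the 50000 average is 0.2. Occupation B, whose county salaries are tightly packed, scores far lower.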

Observations on Percentage Difference of Salaries in Different Counties:

  1. Top 5 in Variation

1.) Sales and related occupations

2.) Legal occupations

3.) Management occupations

4.) Farming, fishing, and forestry occupations

5.) Arts, design, entertainment, sports, and media occupations

  2. Last 5 in Variation

1.) Personal care and service occupations

2.) Health technologists and technicians

3.) Community and social service occupations

4.) Food preparation and serving related occupations

5.) Life, physical, and social science occupations

Analysis of Salary by Gender

ggplot(NewSalary) +
  geom_point(aes(Salary_gender, reorder(Occupations,Salary_gender), color = Gender),size = 2, alpha = 0.75) + ylab('Occupation') + xlab('Salary') + 
  scale_x_continuous(labels = ks)+
  ggtitle('Salaries of Different Occupations of Different Genders in NYC') +
  scale_color_manual(values=c('seagreen3','mediumorchid'))+
  mytheme

Observations on the Salaries of Different Occupations of Different Genders in NYC:

As can be seen in this Cleveland dot plot, the salaries of some occupations vary a lot between genders, while other occupations have similar salaries for the two genders. Also, in some occupations men have higher salaries, and in others women have higher salaries. To understand these characteristics better, we analyze salaries by gender within each occupation in more depth.

Percentage Difference of Salaries between Genders

We use a bar chart to rank occupations by the salary difference between genders. To quantify the difference, we divide the income difference between male and female employees by the average salary of the occupation.

GenderSalary <- NewSalary %>% select(1, 2, 10,13)
GenderSalary <- unique(GenderSalary)

GenderSalary <- pivot_wider(GenderSalary, names_from = "Gender", values_from = "Salary_gender")
GenderSalary$variance <- with(GenderSalary, Male - Female)

GenderSalary <- GenderSalary %>% 
  mutate(variance_pct = variance/Salary_occupation)

ggplot(GenderSalary,aes(x=fct_reorder(Occupations, abs(variance_pct)), y = variance_pct)) +
  geom_col(fill = "royalblue3",alpha = 0.75)+
  coord_flip()+
    theme(axis.text=element_text(),
      axis.title=element_text(face="bold"),
      plot.title = element_text(face = "bold"))+ xlab('Occupations') + ylab('Percentage Difference') + ggtitle('Percentage Difference of Salaries for Different Genders')+
  mytheme
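
The gender percentage difference can be checked on made-up data with the base R sketch below. The data frame `toy` and its values are hypothetical, and it starts from the wide layout (one `Male` and one `Female` column) that `pivot_wider` produces in the code above.

```r
# Hypothetical wide-format salaries by gender (not the chapter's data)
toy <- data.frame(
  Occupations = c("A", "B", "C"),
  Salary_occupation = c(50000, 40000, 60000),
  Male = c(55000, 39000, 61000),
  Female = c(45000, 41000, 59500),
  stringsAsFactors = FALSE
)

# Signed gap between male and female salaries, relative to the average salary
toy$variance <- toy$Male - toy$Female
toy$variance_pct <- toy$variance / toy$Salary_occupation

# A positive value means men earn more; a negative value means women earn more
toy$higher <- ifelse(toy$variance_pct > 0, "Male-higher", "Female-higher")

# Rank by the size of the gap regardless of its direction
toy <- toy[order(abs(toy$variance_pct), decreasing = TRUE), ]
print(toy[, c("Occupations", "variance_pct", "higher")])
```

Ordering by the absolute value while plotting the signed value, as the chart above does, keeps the largest gaps together while the direction of each bar still shows which gender earns more.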

Observations on the Percentage Difference of Salaries for Different Genders:

From the horizontal bar chart above, we discover the following characteristics.

For most occupations, male employees have higher salaries than female employees. Female employees have higher salaries in only 4 of the 25 kinds of occupations, namely Construction and extraction occupations, Installation, maintenance, and repair occupations, Community and social service occupations, and Transportation occupations.

Top 5 and Last 5 in Variance
  1. Top 5

1.) Sales and related occupations

2.) Building and grounds cleaning and maintenance occupations

3.) Material moving occupations

4.) Production occupations

5.) Personal care and service occupations

  2. Last 5

1.) Transportation occupations

2.) Office and administrative support occupations

3.) Community and social service occupations

4.) Installation, maintenance, and repair occupations

5.) Computer and mathematical occupations

Relation between Percentage Difference of Employment and Salary by Gender

Intuitively, the gender composition of employees in a profession is related to how wages differ by gender. We want to check whether this intuition holds. Therefore, we use two categorical variables to represent the two characteristics, namely “Gender Distribution” and “Salary Distribution”. “Gender Distribution” takes two values: Male-dominated, which means there are more male than female employees in the occupation, and Female-dominated, which means there are more female than male employees. “Salary Distribution” also takes two values: Male-higher, which means male employees have the higher salary in the occupation, and Female-higher, which means female employees have the higher salary. Then, we draw a mosaic plot to measure the relation.

From this mosaic plot, we can see that salary distribution is related to gender composition. However, the nature of this connection runs against intuition. We tend to think that the “Female-higher” salary distribution group will contain more female-dominated occupations, and the “Male-higher” group will contain more male-dominated occupations. However, the plot shows the opposite.
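
The code for the mosaic plot is not shown in the chapter. The base R sketch below shows how two such categorical variables could be cross-tabulated and plotted. The data frame `toy` and the column names `gender_dist` and `salary_dist` are hypothetical, and the six rows were made up to mimic the counterintuitive pattern described above.

```r
# Hypothetical classification of six occupations (not the chapter's data)
toy <- data.frame(
  gender_dist = c("Male-dominated", "Male-dominated", "Female-dominated",
                  "Female-dominated", "Male-dominated", "Female-dominated"),
  salary_dist = c("Female-higher", "Male-higher", "Male-higher",
                  "Male-higher", "Female-higher", "Male-higher"),
  stringsAsFactors = FALSE
)

# Contingency table: rows are gender composition, columns are salary direction
tab <- table(GenderDistribution = toy$gender_dist,
             SalaryDistribution = toy$salary_dist)
print(tab)

# Share of each gender composition within each salary group (columns sum to 1)
prop <- prop.table(tab, margin = 2)
print(prop)

# Mosaic plot: tile areas are proportional to the cell counts
mosaicplot(tab, main = "Gender Distribution vs Salary Distribution",
           color = TRUE)
```

In this toy table every "Female-higher" occupation is male-dominated, and most "Male-higher" occupations are female-dominated, which is the kind of reversed association the paragraph above describes.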